Smart Paper Notes

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

Qihuang Zhong , Liang Ding , Yibing Zhan , Yu Qiao , Yonggang Wen , Li Shen , Juhua Liu , Baosheng Yu , Bo Du , Yixin Chen

Category: Natural Language Processing

2022-12-04

This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks, including question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Instead of arbitrarily increasing the size of a pretrained language model (PLM), our aim is to 1) fully extract knowledge from the input pretraining data given a certain parameter budget, e.g., 6B, and 2) effectively transfer this knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs to wisely predict the informative tokens that should be masked, and supervise the masked language modeling (MLM) process with rectified smooth labels. For goal 2), we leverage the prompt transfer technique to improve the low-resource tasks by transferring the knowledge from the foundation model and related downstream tasks to the target task. [Results] According to our submission record (Oct. 2022), with our optimized pretraining and fine-tuning strategies, our 6B Vega method achieved new state-of-the-art performance on 4/8 tasks, sitting atop the SuperGLUE leaderboard on Oct. 8, 2022, with an average score of 91.3.
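
The abstract does not spell out how the "rectified smooth labels" are built, so the following is only a minimal sketch of standard uniform label smoothing for a single masked position, not the authors' method. The function name `smoothed_cross_entropy` and the parameter `eps` are hypothetical names chosen for illustration.

```python
import math


def smoothed_cross_entropy(probs, target, eps=0.1):
    """Cross-entropy of predicted probabilities against a smoothed target.

    probs  : predicted probabilities over the vocabulary (all > 0, sum to 1)
    target : index of the true token at the masked position
    eps    : smoothing mass spread uniformly over the vocabulary (assumption:
             plain uniform smoothing, not the paper's rectified variant)
    """
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        # Smoothed target: eps/k everywhere, plus (1 - eps) on the true token.
        q = eps / k + ((1.0 - eps) if i == target else 0.0)
        loss -= q * math.log(p)
    return loss
```

With `eps=0.0` this reduces to the usual one-hot MLM loss, `-log(probs[target])`; a positive `eps` penalizes the model for putting almost no mass on the other tokens.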

Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox

Qiyue Yin , Tongtong Yu , Shengqi Shen , Jun Yang , Meijing Zhao , Kaiqi Huang , Bin Liang , Liang Wang

Categories: Machine Learning | Artificial Intelligence

2022-12-01

With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning hard to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we summarize the state of this exciting field by comparing the classical distributed deep reinforcement learning methods and studying the components important for efficient distributed learning, ranging from single-player single-agent distributed deep reinforcement learning to the most complex multi-player multi-agent distributed deep reinforcement learning. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. By analyzing their strengths and weaknesses, we develop and release a multi-player multi-agent distributed deep reinforcement learning toolbox, which is further validated on Wargame, a complex environment, showing the usability of the proposed toolbox for multi-player multi-agent distributed deep reinforcement learning in complex games. Finally, we point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.

ISA-Net: Improved spatial attention network for PET-CT tumor segmentation

Zhengyong Huang , Sijuan Zou , Guoshuai Wang , Zixiang Chen , Hao Shen , Haiyan Wang , Na Zhang , Lu Zhang , Fan Yang , Haining Wangg

Category: Computer Vision

2022-11-04

Accurate and automated tumor segmentation plays an important role in both clinical practice and radiomics research. Segmentation in medicine is still often performed manually by experts, which is laborious, expensive and error-prone. Manual annotation relies heavily on the experience and knowledge of these experts, and it is subject to considerable intra- and interobserver variation. It is therefore of great significance to develop a method that can automatically segment tumor target regions. In this paper, we propose a deep learning segmentation method based on multimodal positron emission tomography-computed tomography (PET-CT), which combines the high sensitivity of PET with the precise anatomical information of CT. We design an improved spatial attention network (ISA-Net) to increase the accuracy of PET or CT in detecting tumors; it uses multi-scale convolution operations to extract feature information, highlighting the location information of tumor regions and suppressing that of non-tumor regions. In addition, our network uses dual-channel inputs in the encoding stage and fuses them in the decoding stage, which takes advantage of the differences and complementarities between PET and CT. We validated the proposed ISA-Net method on two clinical datasets, a soft tissue sarcoma (STS) dataset and a head and neck tumor (HECKTOR) dataset, and compared it with other attention methods for tumor segmentation. The DSC scores of 0.8378 on the STS dataset and 0.8076 on the HECKTOR dataset show that the ISA-Net method achieves better segmentation performance and better generalization. Conclusions: The method proposed in this paper is based on multi-modal medical image tumor segmentation and can effectively utilize the differences and complementarities of different modalities. The method can also be applied to other multi-modal or single-modal data with proper adjustment.
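
The DSC reported above is the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch of that formula follows, assuming binary masks flattened into equal-length sequences; the function name `dice_score` and the convention of returning 1.0 for two empty masks are illustrative assumptions, not taken from the paper.

```python
def dice_score(pred, truth):
    """Dice similarity coefficient between two flattened binary masks.

    pred, truth : equal-length sequences of 0/1 voxel labels
    Returns 2 * |pred AND truth| / (|pred| + |truth|).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as tumor in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total tumor voxels in each mask, added together.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total
```

For example, `dice_score([1, 1, 0, 0], [1, 0, 1, 0])` has one overlapping voxel out of four labeled voxels in total, giving 0.5.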

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

Chengzhi Lin , Ancong Wu , Junwei Liang , Jun Zhang , Wenhang Ge , Wei-Shi Zheng , Chunhua Shen

Categories: Computer Vision | Natural Language Processing

2022-09-27

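Cross-modal retrieval between videos and texts has gained increasing research interest due to the rapid emergence of videos on the web. Generally, a video contains rich instance and event information, while the query text describes only a part of that information. Thus, a video can correspond to multiple different text descriptions and queries. We call this phenomenon the "Video-Text Correspondence Ambiguity" problem. Current techniques mostly concentrate on mining local or multi-level alignment between the contents of a video and a text (e.g., object to entity and action to verb). It is difficult for these methods to alleviate the video-text correspondence ambiguity, because they describe a video using only one single feature, which must be matched with multiple different text features at the same time. To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features. Given a query text, the similarity is determined by the most similar prototype to find the correspondence in the video, which is termed text-adaptive matching. To learn diverse prototypes that represent the rich information in videos, we propose a variance loss to encourage different prototypes to attend to different contents of the video. Our method outperforms state-of-the-art methods on four public video retrieval datasets.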
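
The matching rule described in this abstract, where the video-text similarity is set by the most similar prototype, can be sketched as a maximum over per-prototype similarities. The sketch below assumes cosine similarity and already-computed, nonzero embedding vectors; the names `cosine` and `text_adaptive_similarity` are hypothetical, and the prototype aggregation itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def text_adaptive_similarity(text_vec, prototypes):
    """Score a video against a query by its best-matching prototype.

    text_vec   : embedding of the query text
    prototypes : list of prototype embeddings describing one video
    (Assumption: cosine similarity; the paper may use a different measure.)
    """
    return max(cosine(text_vec, p) for p in prototypes)
```

Because only the best prototype counts, a query that describes one event in the video is not penalized by prototypes that encode the video's other contents.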

Multi-dataset Training of Transformers for Robust Action Recognition

Junwei Liang , Enwei Zhang , Jun Zhang , Chunhua Shen

Category: Computer Vision

2022-09-26

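We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition. We build our method on Transformers for their efficacy. Although we have witnessed great progress in video action recognition over the past decade, how to train a single model that performs well across multiple datasets remains challenging yet valuable. Here, we propose a novel multi-dataset training paradigm, MultiTrain, with two new loss terms, namely an informative loss and a projection loss, aiming to learn robust representations for action recognition. In particular, the informative loss maximizes the expressiveness of the feature embedding, while the projection loss for each dataset mines the intrinsic relations between classes across datasets. We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet and Something-Something-v2. Extensive experimental results show that our method consistently improves on state-of-the-art performance.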

Tensor-Based Multi-Modality Feature Selection and Regression for Alzheimer's Disease Diagnosis

Jun Yu , Zhaoming Kong , Liang Zhan , Li Shen , Lifang He

Categories: Machine Learning | Computer Vision

2022-09-23

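The assessment of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) associated with brain changes remains a challenging task. Recent studies have demonstrated that a combination of multi-modality imaging techniques can better reflect pathological characteristics and contribute to a more accurate diagnosis of AD and MCI. In this paper, we propose a novel tensor-based multi-modality feature selection and regression method for diagnosis and biomarker identification of AD and MCI against normal controls. Specifically, we leverage the tensor structure to exploit the high-level correlation information inherent in multi-modality data, and investigate tensor-level sparsity in the multilinear regression model. We present the practical advantages of our method for the analysis of ADNI data using three imaging modalities (VBM-MRI, FDG-PET and AV45-PET) together with clinical parameters of disease severity and cognitive scores. The experimental results demonstrate the superior performance of our proposed method over state-of-the-art methods for disease diagnosis and for the identification of disease-specific regions and modality-related differences. The code for this work is publicly available at https://github.com/junfish/bios22.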

ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery

Lanqing Li , Liang Zeng , Ziqi Gao , Shen Yuan , Yatao Bian , Bingzhe Wu , Hengtong Zhang , Chan Lu , Yang Yu , Wei Liu

Categories: Machine Learning | Artificial Intelligence

2022-09-16

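The last decade has witnessed a prosperous development of computational methods and dataset curation for AI-aided drug discovery (AIDD). However, real-world pharmaceutical datasets often exhibit highly imbalanced distributions, which is largely overlooked by the current literature but may severely compromise the fairness and generalization of machine learning applications. Motivated by this observation, we introduce ImDrug, a comprehensive benchmark with an open-source Python library that consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline algorithms tailored for imbalanced learning. It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline, such as molecular modeling, drug-target interaction and retrosynthesis. We conduct extensive empirical studies with novel evaluation metrics to demonstrate that existing algorithms fall short of solving medicinal and pharmaceutical challenges in the data imbalance scenario. We believe that ImDrug opens up avenues for future research and development on real-world challenges at the intersection of AIDD and deep imbalanced learning.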

On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation

Changtong Zan , Liang Ding , Li Shen , Yu Cao , Weifeng Liu , Dacheng Tao

Category: Natural Language Processing

2022-09-07

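Pre-training (PT) of text representations has been successfully applied to low-resource neural machine translation (NMT). However, on resource-rich NMT it usually fails to achieve notable gains over its random-initialization (RI) counterpart, and is sometimes even worse. We take the first step to investigate the complementarity between PT and RI in resource-rich scenarios via two probing analyses, and find that: 1) PT improves not the accuracy but the generalization, by achieving flatter loss landscapes than RI; 2) PT improves not the confidence of lexical choice but the negative diversity, by assigning smoother lexical probability distributions than RI. Based on these insights, we propose to combine their complementarities with a model fusion algorithm that uses optimal transport to align neurons between PT and RI. Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI complement each other well, achieving substantial improvements in translation accuracy, generalization and negative diversity at the same time. Probing tools and code are released at: https://github.com/zanchangtong/ptvsri.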

Adversarial Camouflage for Node Injection Attack on Graphs

Shuchang Tao , Qi Cao , Huawei Shen , Yunfan Wu , Liang Hou , Xueqi Cheng

Category: Machine Learning

2022-08-03

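Node injection attacks against Graph Neural Networks (GNNs) have received increasing attention as a practical attack scenario, in which the attacker injects malicious nodes instead of modifying node features or edges to degrade the performance of GNNs. Despite the initial success of node injection attacks, we find that the nodes injected by existing methods are easily distinguished from the original normal nodes by defense methods, which limits their attack performance in practice. To solve this issue, we focus on camouflaged node injection attacks, i.e., camouflaging the injected malicious nodes (structure/attributes) as normal nodes that appear legitimate/imperceptible to defense methods. The non-Euclidean nature of graph data and the lack of human priors bring great challenges to the formalization, implementation and evaluation of camouflage on graphs. In this paper, we first propose and formulate the camouflage of injected nodes in terms of both the fidelity and the diversity of the ego networks centered around the injected nodes. Then, we design an adversarial camouflage framework for node injection attacks, namely CANA, to improve the camouflage while ensuring the attack performance. Several novel indicators for graph camouflage are further designed for a comprehensive evaluation. Experimental results demonstrate that when existing node injection attack methods are equipped with our proposed CANA framework, both the attack performance against defense methods and the node camouflage are significantly improved.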

On Mitigating Hard Clusters for Face Clustering

Yingjie Chen , Huasong Zhong , Chong Chen , Chen Shen , Jianqiang Huang , Tao Wang , Yun Liang , Qianru Sun

Category: Computer Vision

2022-07-25

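Face clustering is a promising way to scale up face recognition systems using large-scale unlabeled face images. It remains challenging to identify small or sparse face image clusters, which we call hard clusters; the difficulty is caused by the heterogeneity of the clusters, i.e., high variations in size and sparsity. Consequently, the conventional way of using a uniform threshold (to identify clusters) often leads to severe misclassification of the samples that should belong to hard clusters. We tackle this problem by leveraging the neighborhood information of samples and inferring the cluster memberships (of samples) in a probabilistic way. We introduce two novel modules, Neighborhood-Diffusion-based Density (NDDe) and Transition-Probability-based Distance (TPDi), based on which we can simply apply the standard Density Peak Clustering algorithm with a uniform threshold. Our experiments on multiple benchmarks show that each module contributes to the final performance of our method, and that by incorporating them into other advanced face clustering methods, these two modules can boost the performance of those methods to a new state of the art. Code is available at: https://github.com/echoanran/on-mitigating-hard-clusters.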
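
As a rough illustration of the final step only, the sketch below shows a common linking form of density peak clustering with a uniform distance threshold: each sample attaches to its nearest higher-density sample if that sample is closer than the threshold, and otherwise starts its own cluster. The per-sample densities and pairwise distances are taken as given inputs (in the paper they would come from NDDe and TPDi, which are not implemented here); the function name `density_peak_clusters` and the parameter `tau` are hypothetical.

```python
def density_peak_clusters(density, dist, tau):
    """Assign cluster labels by linking samples to denser neighbors.

    density : list of per-sample density values
    dist    : n x n matrix (list of lists) of pairwise distances
    tau     : uniform distance threshold for accepting a link
    Returns a list where labels[i] is the index of sample i's cluster root.
    """
    n = len(density)
    # Process samples from densest to sparsest (ties keep input order).
    order = sorted(range(n), key=lambda i: -density[i])
    parent = list(range(n))
    for rank, i in enumerate(order):
        denser = order[:rank]  # samples processed before i
        if not denser:
            continue  # the densest sample is always a cluster root
        nearest = min(denser, key=lambda j: dist[i][j])
        if dist[i][nearest] < tau:
            parent[i] = nearest  # link to the nearest denser sample

    def root(i):
        # Links only point to earlier-processed samples, so this terminates.
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(n)]
```

With five samples at positions 0, 1, 2, 10 and 11 on a line, densities `[1, 3, 1, 2, 1]`, absolute differences as distances and `tau = 3`, the first three samples link to sample 1 and the last two to sample 3.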
